Memory-hierarchy optimal matrix multiplication programs
Author
Abstract
In many applications that work on huge data sets, from areas like bioinformatics, data mining, network analysis, optimization, and simulation, the computation time is a major concern. To shorten this time, we have to take into account the many aspects that determine the running time of a program on a modern computer. One of these aspects is the range of different types of memory, from fast but small to huge but slow: the registers on the CPU, various kinds of caches, main memory, and disk form the levels of the memory hierarchy. Data transfer between these levels happens by means of so-called I/O operations, which can be very slow and should be minimized to achieve fast programs. The core computation of many data-intensive applications can be modeled with sparse matrices, so that general-purpose high-performance software libraries for abstract operations can be used. One of the basic such operations is the multiplication of a huge sparse matrix with a vector. It has been recognized that this is one of the operations where modern computers operate significantly below their peak CPU performance, indicating that the data transfer in the memory hierarchy is indeed the bottleneck. In very recent work, we showed a lower bound on the number of data transfers that are necessary to compute the product of an arbitrary sparse matrix with a vector, and gave a sorting-based algorithm that is asymptotically optimal. Fortunately, in many applications the sparse matrices do have a structure that can be exploited to perform the multiplication with fewer I/Os. The focus of this project is to (automatically) analyze the structure of a huge sparse matrix with respect to the amount of data transfer that is required to multiply with this matrix. In other words, for a given sparse matrix A we are interested in a program that transforms a vector x into the product Ax with the fewest possible I/O operations.
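As a concrete illustration of the operation under study, here is a minimal sketch of the product y = Ax for a sparse matrix stored in compressed sparse row (CSR) format; the function and parameter names are illustrative and not taken from the project. The irregular reads of x through the column indices are exactly the memory traffic that the I/O analysis above targets.

    /* Minimal CSR sparse matrix-vector product y = A*x (illustrative sketch). */
    #include <stddef.h>

    void spmv_csr(size_t n,             /* number of rows of A */
                  const size_t *rowptr, /* nonzeros of row i: rowptr[i]..rowptr[i+1]-1 */
                  const size_t *col,    /* column index of each nonzero */
                  const double *val,    /* value of each nonzero */
                  const double *x,      /* input vector */
                  double *y)            /* output vector, length n */
    {
        for (size_t i = 0; i < n; i++) {
            double sum = 0.0;
            for (size_t j = rowptr[i]; j < rowptr[i + 1]; j++)
                sum += val[j] * x[col[j]]; /* irregular access to x: the expensive I/O */
            y[i] = sum;
        }
    }

For an unstructured matrix, the accesses x[col[j]] jump unpredictably through memory, which is why the number of I/O operations, rather than the number of multiplications, dominates the running time.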
Similar resources
An Experimental Comparison of Cache-oblivious and Cache-aware Programs
Cache-oblivious algorithms have been advanced as a way of circumventing some of the difficulties of optimizing applications to take advantage of the memory hierarchy of modern microprocessors. These algorithms are based on the divide-and-conquer paradigm – each division step creates sub-problems of smaller size, and when the working set of a sub-problem fits in some level of the memory hierarch...
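To make the divide-and-conquer idea concrete, the following is a hypothetical sketch of a cache-oblivious recursive matrix multiplication C += A*B for square row-major matrices whose size n is a power of two (leading dimension ld, initial call with ld = n and C zeroed); the names and simplifying assumptions are ours, not the paper's. Note that no cache parameter appears in the code: once a sub-problem's working set fits in some cache level, all of its accesses are served from that level.

    /* Illustrative cache-oblivious matrix multiplication: C += A*B,
       n a power of two, matrices row-major with leading dimension ld. */
    #include <stddef.h>

    static void matmul_rec(const double *A, const double *B, double *C,
                           size_t n, size_t ld)
    {
        if (n == 1) {            /* base case: single scalar update */
            C[0] += A[0] * B[0];
            return;
        }
        size_t h = n / 2;        /* split A, B, C into four quadrants each */
        const double *A11 = A,          *A12 = A + h,
                     *A21 = A + h * ld, *A22 = A + h * ld + h;
        const double *B11 = B,          *B12 = B + h,
                     *B21 = B + h * ld, *B22 = B + h * ld + h;
        double *C11 = C,                *C12 = C + h,
               *C21 = C + h * ld,       *C22 = C + h * ld + h;
        /* each quadrant of C is the sum of two quadrant products */
        matmul_rec(A11, B11, C11, h, ld); matmul_rec(A12, B21, C11, h, ld);
        matmul_rec(A11, B12, C12, h, ld); matmul_rec(A12, B22, C12, h, ld);
        matmul_rec(A21, B11, C21, h, ld); matmul_rec(A22, B21, C21, h, ld);
        matmul_rec(A21, B12, C22, h, ld); matmul_rec(A22, B22, C22, h, ld);
    }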
Communication Lower Bounds and Optimal Algorithms for Programs That Reference Arrays — Part 1 (Revised)
Communication, i.e., moving data, between levels of a memory hierarchy or between parallel processors on a network, can greatly dominate the cost of computation, so algorithms that minimize communication can run much faster (and use less energy) than algorithms that do not. Motivated by this, attainable communication lower bounds were established in [12, 13, 4] for a variety of algorithms inclu...
An Algebraic Approach to Cache Memory Characterization for Block Recursive Algorithms
Multiprocessor systems usually have cache or local memory in the memory hierarchy. Obtaining good performance on these systems requires that a program utilizes the cache efficiently. In this paper, we address the issue of generating efficient cache-based algorithms from tensor product formulas. Tensor product formulas have been used for expressing block recursive algorithms like Strassen's matrix ...
Communication-Optimal Convolutional Neural Nets
Efficiently executing convolutional neural nets (CNNs) is important in many machine-learning tasks. Since the cost of moving a word of data, either between levels of a memory hierarchy or between processors over a network, is much higher than the cost of an arithmetic operation, minimizing data movement is critical to performance optimization. In this paper, we present both new lower bounds on d...
Publication date: 2007